153 research outputs found

    Towards a biodiversity knowledge graph

    One way to think about "core" biodiversity data is as a network of connected entities, such as taxa, taxonomic names, publications, people, species, sequences, images, and collections, that form the "biodiversity knowledge graph". Many questions in biodiversity informatics can be framed as paths in this graph. This article explores this idea further and sketches a set of services and tools we would need in order to construct the graph.
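    As a minimal sketch of "questions as paths", the Python below stores a few hypothetical entities and links as an undirected graph and answers a question with a breadth-first shortest-path search. The identifiers (`name:Aus bus`, `pub-1`, and so on) are placeholders. A real knowledge graph would use globally unique identifiers and typed edges rather than plain strings.

```python
from collections import deque

# Toy graph: each node is an entity, each edge a link between two entities.
# All identifiers are hypothetical placeholders.
EDGES = [
    ("name:Aus bus", "publication:pub-1"),         # name published in an article
    ("publication:pub-1", "person:author-1"),      # article written by a person
    ("sequence:seq-1", "specimen:voucher-1"),      # sequence from a voucher specimen
    ("specimen:voucher-1", "publication:pub-1"),   # voucher cited in the article
]

def build_graph(edges):
    """Return an undirected adjacency list built from (a, b) edge pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def shortest_path(graph, start, goal):
    """Breadth-first search: return the nodes of a shortest path, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in sorted(graph.get(node, ())):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None

graph = build_graph(EDGES)
# "Which people are connected to this sequence?" becomes a path query.
print(shortest_path(graph, "sequence:seq-1", "person:author-1"))
```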

    Surfacing the deep data of taxonomy

    Taxonomic databases are perpetuating approaches to citing literature that may have been appropriate before the Internet, often being little more than digitised 5 × 3 index cards. Typically the original taxonomic literature is either not cited, or is represented in the form of a (typically abbreviated) text string. Hence much of the “deep data” of taxonomy, such as the original descriptions, revisions, and nomenclatural actions, is largely hidden from all but the most resourceful users. At the same time there are burgeoning efforts to digitise the scientific literature, and much of this newly available content has been assigned globally unique identifiers such as Digital Object Identifiers (DOIs), which are also the identifier of choice for most modern publications. This represents an opportunity for taxonomic databases to engage with digitisation efforts. Mapping the taxonomic literature on to globally unique identifiers can be time-consuming, but need be done only once. Furthermore, if we reuse existing identifiers, rather than mint our own, we can start to build the links between the diverse data that are needed to support the kinds of inference which biodiversity informatics aspires to support. Until this practice becomes widespread, the taxonomic literature will remain balkanized, and much of the knowledge that it contains will linger in obscurity.
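    Reusing existing identifiers is easier if the same DOI written in different forms is recognised as one identifier. The sketch below assumes DOIs arrive either bare or with a `doi:` or `https://doi.org/` prefix, and lowercases them because DOI names are case-insensitive. The citation-to-DOI table and the helpers `normalise_doi`, `record_mapping`, and `lookup` are hypothetical. They illustrate a mapping that is built once and then reused.

```python
import re

# Resolver prefixes that may precede a DOI in a database record.
DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)

def normalise_doi(raw):
    """Strip any resolver prefix and lowercase the DOI (DOIs are case-insensitive)."""
    doi = DOI_PREFIX.sub("", raw.strip())
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {raw!r}")
    return doi.lower()

# Hypothetical table from abbreviated citation strings to DOIs: built once, reused.
CITATION_TO_DOI = {}

def _key(citation):
    """Collapse runs of whitespace so trivially different strings share a key."""
    return " ".join(citation.split())

def record_mapping(citation, raw_doi):
    """Store the normalised DOI for a citation string."""
    CITATION_TO_DOI[_key(citation)] = normalise_doi(raw_doi)

def lookup(citation):
    """Return the DOI for a citation string, or None if it has not been mapped yet."""
    return CITATION_TO_DOI.get(_key(citation))

# Placeholder citation and DOI, for illustration only.
record_mapping("J. Hypoth. Zool. 12: 34", "https://doi.org/10.1234/EXAMPLE.1")
print(lookup("J.  Hypoth. Zool.   12: 34"))  # prints 10.1234/example.1
```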

    Finding scientific articles in a large digital archive: BioStor and the Biodiversity Heritage Library

    The Biodiversity Heritage Library (BHL) is a large digital archive of legacy biological literature, comprising over 31 million pages scanned from books, monographs, and journals. During the digitisation process basic metadata about the scanned items is recorded, but not article-level metadata. Given that the article is the standard unit of citation, this makes it difficult to locate cited literature in BHL. Adding the ability to easily find articles in BHL would greatly enhance the value of the archive. A service was developed to locate articles in BHL based on matching article metadata to BHL metadata using approximate string matching, regular expressions, and string alignment. This article-finding service is exposed as a standard OpenURL resolver on the BioStor web site (http://biostor.org/openurl/). This resolver can be used on the web, or called by bibliographic tools that support OpenURL. BioStor provides tools for extracting, annotating, and visualising articles from the Biodiversity Heritage Library. BioStor is available at http://biostor.org/.
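    The article-finding step can be sketched in two parts: approximate matching of a cited journal title against candidate BHL titles, followed by building an OpenURL query. In the Python below, `difflib.SequenceMatcher` stands in for BioStor's own matching and string alignment, and the 0.8 threshold is an arbitrary assumption. The query uses standard OpenURL 1.0 key/encoded-value parameters. Whether the BioStor resolver expects exactly these keys is an assumption, not something the abstract states.

```python
from difflib import SequenceMatcher
from urllib.parse import urlencode

def similarity(a, b):
    """Similarity ratio in [0, 1] between two strings, ignoring case."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()

def best_title_match(cited_title, candidate_titles, threshold=0.8):
    """Return (title, score) for the closest candidate, or None if none is close enough."""
    scored = [(title, similarity(cited_title, title)) for title in candidate_titles]
    best = max(scored, key=lambda pair: pair[1], default=None)
    if best is None or best[1] < threshold:
        return None
    return best

def openurl_query(base_url, journal, volume, start_page, year):
    """Build an OpenURL 1.0 (Z39.88-2004) key/encoded-value query for a journal article."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
        "rft.jtitle": journal,
        "rft.volume": volume,
        "rft.spage": start_page,
        "rft.date": year,
    }
    return base_url + "?" + urlencode(params)
```

    For example, `openurl_query("http://biostor.org/openurl/", "Journal title", 12, 34, 1900)` builds a query URL for that resolver, under the same assumption about its parameters.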

    Phyloinformatics in the age of Wikipedia

    This talk describes a mapping between the NCBI taxonomy database and Wikipedia. These two databases were chosen because the NCBI taxonomy contains all the taxa for which sequences are publicly available, and for many taxa Wikipedia is the first site returned in a Google search on that taxon's scientific name. The NCBI web pages for nearly 53,000 NCBI taxa now have a link to the corresponding page in Wikipedia.
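    The mapping can be sketched under one assumption: links are made by exact matching of NCBI scientific names to Wikipedia article titles, after normalising whitespace and case. The function name and data below are hypothetical. Exact matching misses pages titled by common names and cannot tell homonyms apart, so a real mapping would need extra checks.

```python
def normalise_name(name):
    """Collapse whitespace and case-fold a scientific name for comparison."""
    return " ".join(name.split()).casefold()

def link_taxa_to_wikipedia(ncbi_taxa, wikipedia_titles):
    """Map NCBI tax_id -> Wikipedia title where the scientific names match.

    ncbi_taxa: dict of tax_id -> scientific name
    wikipedia_titles: iterable of article titles
    """
    by_name = {}
    for title in wikipedia_titles:
        # Keep the first title seen for each normalised name.
        by_name.setdefault(normalise_name(title), title)
    links = {}
    for tax_id, name in ncbi_taxa.items():
        title = by_name.get(normalise_name(name))
        if title is not None:
            links[tax_id] = title
    return links

# Hypothetical identifiers and names.
print(link_taxa_to_wikipedia({101: "Aus bus", 102: "Cus dus"}, ["Aus bus", "Eus fus"]))
```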

    Towards a Taxonomically Intelligent Phylogenetic Database

    This note outlines some of the key intellectual obstacles that stand in the way of creating a usable phylogenetic database. These challenges include the need to accommodate multiple taxonomic names and classifications, and the need for tools to query trees in biologically meaningful ways. Until these problems are addressed, and a taxonomically intelligent phylogenetic database created, much of our phylogenetic knowledge will languish in the pages of journals.
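    One small example of querying trees in a taxonomically aware way is to resolve synonyms before asking for the most recent common ancestor (MRCA) of a set of taxa. The tree, node labels, and synonym table below are hypothetical. A real database would also need to handle competing classifications.

```python
# Hypothetical synonym table: alternative name -> name used in the tree.
SYNONYMS = {"Aus cus": "Aus bus"}

# Hypothetical tree stored as child -> parent; internal nodes are n1 (root) and n2.
PARENT = {"Aus bus": "n2", "Aus dus": "n2", "n2": "n1", "Eus fus": "n1"}

def accepted(name):
    """Replace a synonym with the name used in the tree."""
    return SYNONYMS.get(name, name)

def ancestors(node, parent):
    """Return [node, parent, grandparent, ..., root]."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def mrca(names, parent):
    """Most recent common ancestor of the given taxa, after resolving synonyms."""
    leaves = [accepted(n) for n in names]
    for leaf in leaves:
        if leaf not in parent:
            raise KeyError(f"taxon not in tree: {leaf}")
    common = set(ancestors(leaves[0], parent))
    for leaf in leaves[1:]:
        common &= set(ancestors(leaf, parent))
    # The first shared node on the way up from any leaf is the MRCA.
    for node in ancestors(leaves[0], parent):
        if node in common:
            return node

# "Aus cus" is not in the tree, but resolves to "Aus bus".
print(mrca(["Aus cus", "Aus dus"], PARENT))
```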

    Liberating links between datasets using lightweight data publishing: an example using plant names and the taxonomic literature

    Constructing a biodiversity knowledge graph will require making millions of cross-links between diversity entities in different datasets. Researchers trying to bootstrap the growth of the biodiversity knowledge graph by constructing databases of links between these entities lack obvious ways to publish these sets of links. One appealing and lightweight approach is to create a "datasette", a database that is wrapped together with a simple web server that enables users to query the data. Datasettes can be packaged into Docker containers and hosted online with minimal effort. This approach is illustrated using a dataset of links between globally unique identifiers for plant taxonomic names and identifiers for the taxonomic articles that published those names.
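    A datasette serves an ordinary SQLite database, so the link set itself can be sketched as a two-column SQLite table, built here with Python's standard `sqlite3` module. The table layout and identifiers are placeholders, not the schema of the dataset described above. Written to a file instead of memory, the result could then be served with something like `datasette links.db`.

```python
import sqlite3

def build_link_db(path, links):
    """Create a SQLite table of (name identifier, work identifier) links."""
    con = sqlite3.connect(path)
    with con:  # commits the transaction on success
        con.execute(
            "CREATE TABLE IF NOT EXISTS links ("
            " name_id TEXT NOT NULL,"
            " work_id TEXT NOT NULL,"
            " PRIMARY KEY (name_id, work_id))"
        )
        # The primary key plus OR IGNORE drops duplicate links.
        con.executemany("INSERT OR IGNORE INTO links VALUES (?, ?)", links)
    return con

def works_for_name(con, name_id):
    """Return the identifiers of works linked to a name identifier."""
    rows = con.execute(
        "SELECT work_id FROM links WHERE name_id = ? ORDER BY work_id", (name_id,)
    )
    return [row[0] for row in rows]

# Placeholder identifiers only.
con = build_link_db(":memory:", [
    ("name-0001", "10.1234/example.1"),
    ("name-0001", "10.1234/example.1"),  # duplicate link is ignored
    ("name-0002", "10.1234/example.2"),
])
print(works_for_name(con, "name-0001"))
```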

    DNA barcoding and taxonomy: dark taxa and dark texts

    Both classical taxonomy and DNA barcoding are engaged in the task of digitizing the living world. Much of the taxonomic literature remains undigitized. The rise of open access publishing this century and the freeing of older literature from the shackles of copyright have greatly increased the online availability of taxonomic descriptions, but much of the literature of the mid- to late-twentieth century remains offline (‘dark texts’). DNA barcoding is generating a wealth of computable data that in many ways are much easier to work with than classical taxonomic descriptions, but many of the sequences are not identified to species level. These ‘dark taxa’ hamper the classical method of integrating biodiversity data, using shared taxonomic names. Voucher specimens are a potential common currency of both the taxonomic literature and sequence databases, and could be used to help link names, literature and sequences. An obstacle to this approach is the lack of stable, resolvable specimen identifiers. The paper concludes with an appeal for a global ‘digital dashboard’ to assess the extent to which biodiversity data are available online. This article is part of the themed issue ‘From DNA barcodes to biomes’.
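    If voucher specimens are to act as a common currency, the same specimen has to be recognised across sequence records and papers even when its code is written differently. The sketch assumes codes consist of institution letters plus a catalogue number, such as the hypothetical `XYZ 12345`. Real specimen codes vary far more, which is part of why stable, resolvable identifiers matter.

```python
import re

# Assumed format: institution letters, optional separator, catalogue number.
SPECIMEN_CODE = re.compile(r"^\s*([A-Za-z]+)[\s:_-]*(\d+)\s*$")

def normalise_specimen(code):
    """Return a canonical 'INSTITUTION:NUMBER' key, or None if the code does not parse."""
    match = SPECIMEN_CODE.match(code)
    if match is None:
        return None
    return f"{match.group(1).upper()}:{match.group(2)}"

def link_sequences_to_literature(sequence_vouchers, cited_vouchers):
    """Join sequences and publications that refer to the same voucher specimen.

    sequence_vouchers: dict of sequence accession -> voucher code as written
    cited_vouchers: dict of publication id -> list of voucher codes cited in it
    Returns a sorted list of (accession, publication id, normalised voucher).
    """
    pubs_by_voucher = {}
    for pub, codes in cited_vouchers.items():
        for code in codes:
            key = normalise_specimen(code)
            if key is not None:
                pubs_by_voucher.setdefault(key, set()).add(pub)
    links = []
    for accession, code in sequence_vouchers.items():
        key = normalise_specimen(code)
        for pub in pubs_by_voucher.get(key, ()):
            links.append((accession, pub, key))
    return sorted(links)

# Hypothetical accessions, publications and specimen codes.
print(link_sequences_to_literature(
    {"SEQ001": "XYZ 12345", "SEQ002": "XYZ-999"},
    {"pub-1": ["xyz:12345"], "pub-2": ["ABC 1"]},
))
```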

    Towards an Open Taxonomy

    Taxonomy is in many ways still predigital. Most taxonomic databases are little more than digitized index cards linking names to often-cryptic bibliographic citations, oblivious to the growing volume of scientific literature that is now online. A growing fraction of taxonomic literature is becoming freely available, either through adoption of Open Access publishing models, or through digitizing efforts such as the Biodiversity Heritage Library. Yet much of the most basic information about biodiversity, namely taxonomic descriptions, remains either behind a paywall or available only in paper form. This talk sketches the goal of an "Open Taxonomy." The first step towards this goal is digitally linking scientific names to the primary literature using standard identifiers such as DOIs. I argue that until we make serious inroads into this task, taxonomic knowledge will remain in a ghetto largely ignored by the wider scientific community.

    A Taxonomic Search Engine: Federating taxonomic databases using web services

    BACKGROUND: The taxonomic name of an organism is a key link between different databases that store information on that organism. However, in the absence of a single, comprehensive database of organism names, individual databases lack an easy means of checking the correctness of a name. Furthermore, the same organism may have more than one name, and the same name may apply to more than one organism. RESULTS: The Taxonomic Search Engine (TSE) is a web application written in PHP that queries multiple taxonomic databases (ITIS, Index Fungorum, IPNI, NCBI, and uBIO) and summarises the results in a consistent format. It supports "drill-down" queries to retrieve a specific record. The TSE can optionally suggest alternative spellings the user can try. It also acts as a Life Science Identifier (LSID) authority for the source taxonomic databases, providing globally unique identifiers (and associated metadata) for each name. CONCLUSION: The Taxonomic Search Engine is available at and provides a simple demonstration of the potential of the federated approach to providing access to taxonomic names.
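    The federated design can be sketched as one adapter per source, which converts that database's own record shape into a common format, plus a spelling suggester. The TSE itself is written in PHP and queries the real services listed above. Here both sources are hypothetical in-memory stand-ins, and `difflib.get_close_matches` is an assumed substitute for whatever spelling suggestion the TSE uses.

```python
from difflib import get_close_matches

# Hypothetical stand-ins for two remote databases with different record shapes.
SOURCE_A = {"Aus bus": {"id": "A-1"}}
SOURCE_B = [{"taxon": "Aus bus", "key": 42}, {"taxon": "Aus bas", "key": 43}]

def query_a(name):
    """Adapter for source A: dict lookup, converted to the common format."""
    record = SOURCE_A.get(name)
    return [] if record is None else [{"source": "A", "name": name, "id": record["id"]}]

def query_b(name):
    """Adapter for source B: list scan, converted to the common format."""
    return [{"source": "B", "name": r["taxon"], "id": str(r["key"])}
            for r in SOURCE_B if r["taxon"] == name]

ADAPTERS = [query_a, query_b]

def federated_search(name):
    """Query every source and return the results in one consistent format."""
    results = []
    for adapter in ADAPTERS:
        results.extend(adapter(name))
    return results

def suggest_spellings(name, n=3, cutoff=0.8):
    """Suggest known names close to a possibly misspelled query."""
    known = set(SOURCE_A) | {r["taxon"] for r in SOURCE_B}
    return get_close_matches(name, sorted(known), n=n, cutoff=cutoff)

print(federated_search("Aus bus"))
print(suggest_spellings("Aus buss"))
```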

    TreeMap Versus BPA (Again): A Response to Dowling

    TreeMap is a computer program for analysing host-parasite cospeciation. We respond to Dowling’s (Cladistics, 18: 416-435) recent comparison of TreeMap and Brooks Parsimony Analysis (BPA) by showing that Dowling’s comparison suffers from several mistakes and flaws. We discuss the problems with both BPA and TreeMap, and show that BPA incorrectly counts the true number of coevolutionary events more often than TreeMap 1. We also discuss the two main limitations of TreeMap 1 correctly identified by Dowling, namely its inability to handle widespread parasites, and its coarse optimality criterion (the number of cospeciation events). We suggest a simple fix for widespread parasites. The newly released TreeMap 2 uses a more sensitive optimality criterion than TreeMap 1, addressing Dowling’s second concern.
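    The optimality criterion mentioned above, the number of cospeciation events, can be illustrated with the standard LCA mapping used in tree reconciliation. Each parasite node is mapped to the lowest common ancestor of its children's host nodes, and counted as a cospeciation when neither child maps to that same host node. This is a generic sketch with hypothetical trees, not TreeMap's code. Matching the limitation noted above, it assumes each parasite has exactly one host.

```python
def ancestors(node, parent):
    """Return [node, parent, grandparent, ..., root] for a child -> parent dict."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def lca(a, b, parent):
    """Lowest common ancestor of two nodes in the same tree."""
    above_a = set(ancestors(a, parent))
    for node in ancestors(b, parent):
        if node in above_a:
            return node
    raise ValueError("nodes are not in the same tree")

def reconcile(parasite_children, root, host_of, host_parent):
    """Map parasite nodes onto the host tree and count cospeciation events.

    parasite_children: dict of internal parasite node -> (left child, right child)
    host_of: dict of parasite leaf -> its single host leaf
    host_parent: host tree as a child -> parent dict
    """
    mapping = {}
    cospeciations = 0

    def visit(node):
        nonlocal cospeciations
        if node not in parasite_children:  # parasite leaf: map to its host
            mapping[node] = host_of[node]
            return
        left, right = parasite_children[node]
        visit(left)
        visit(right)
        host_node = lca(mapping[left], mapping[right], host_parent)
        mapping[node] = host_node
        # If a child maps to the same host node, the split happened within one
        # host lineage (a duplication); otherwise it tracks a host split.
        if host_node != mapping[left] and host_node != mapping[right]:
            cospeciations += 1

    visit(root)
    return mapping, cospeciations

# Hypothetical trees: hosts ((H1,H2),(H3,H4)), parasites ((P1,P2),(P3,P4)).
HOST_PARENT = {"H1": "h12", "H2": "h12", "H3": "h34", "H4": "h34",
               "h12": "hroot", "h34": "hroot"}
PARASITE_CHILDREN = {"proot": ("p12", "p34"), "p12": ("P1", "P2"), "p34": ("P3", "P4")}
HOST_OF = {"P1": "H1", "P2": "H2", "P3": "H3", "P4": "H3"}  # P3 and P4 share H3

mapping, count = reconcile(PARASITE_CHILDREN, "proot", HOST_OF, HOST_PARENT)
print(count)  # 2: p12 and proot track host splits; p34 is a duplication within H3
```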